Abstract

The  research  reviewed  in  this  study  specifically  covered  3  issues: 

1.  The  design  of  "ideal  execution  architectures"  suitable  for  a  range  of  source  languages  and  host 
machine  environments. 

2.  Analysis  of  the  relation  between  source  language  program  constructs  and  execution  architectures. 

3.  Methods  of  implementing  these  "ideal"  instruction  sets. 

Introduction

The  basic  objective  of  this  research  is  to  find  improved  computer  architectures  which  will  allow  more 
concise  representations  of  programs,  faster  execution  of  programs  and  more  accurate  program 
representations. 

In  our  past  research,  we  have  demonstrated  techniques  to  afford  significant  improvement  in  spatial 
requirements to represent programs and in execution time to interpret programs. These techniques are based
upon  creating  computer  architectures  which  are  tailored  to  a  single  high  level  language  in  which  the 
architecture  represents  only  those  objects  explicitly  mentioned  in  the  high  level  language  (a  1:1  language  to 
architecture  mapping). 

Techniques  have  been  developed,  such  as  accessing  values  by  contour  map  and  use  of 
transformationally complete sets of formats, which allow the 1:1 mapping. The minimization of objects
both reduces program size and improves execution time.

DEL  Theory 

We  have  completed  at  least  a  preliminary  version  of  a  broad  based  theory  on  the  synthesis  of  directly 
executed  languages.  The  kernel  of  this  is  a  measure  of  ideal  program  representation.  In  actuality,  this  is  a 
constructive  measure  of  program  space  and  interpretation  that  one  can  use  for  exploitation  of  alternative 
representations. This measure, while not necessarily achievable, intuitively represents a clearly superior
program  representation  that  is  readily  definable  and  easy  to  use. 

We call these measures the canonic interpretive measures (CI measures) of a program. The space
measure  is  the  number  of  bits  of  program  size,  and  the  time  measure  is  the  number  of  instruction,  data, 
memory  and  other  actions  in  the  program  space. 

The CI measure usually indicates a full order of magnitude less space than the System 360
representation or other familiar traditional machine representations. The CI measures play an important role
in our development of a theory of DEL synthesis, since we essentially proceed in a step-by-step fashion to
achieve program measures which are as close as possible to them. Except for frequency information, the CI
measures are oriented toward an information theoretic minimum program representation. In many ways our
DEL synthesis theory achieves exactly the CI measures. However, in order to achieve the transformational
completeness property, we have found that 22 formats are required. This adds effectively about 5 bits to each
instruction unit or statement (log2 of 22, rounded up). In most other ways, however, CI measures are achieved.
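
The format overhead quoted above is simply the width of a field wide enough to distinguish the 22 formats. A minimal sketch of that arithmetic follows; the function name `field_bits` is illustrative and does not come from the report.

```python
def field_bits(distinct_values: int) -> int:
    """Smallest number of bits that can distinguish `distinct_values` codes.

    Equivalent to ceil(log2(n)) for n >= 1, computed with integer
    arithmetic to avoid floating-point rounding.
    """
    if distinct_values < 1:
        raise ValueError("need at least one value to encode")
    return (distinct_values - 1).bit_length()


# 22 formats need a 5-bit format field, since 2**4 = 16 < 22 <= 32 = 2**5.
FORMAT_BITS = field_bits(22)
```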

Ideal Image Machines and their Hosts

In  trying  to  define  an  ideal  image  machine,  one  must  first  recognize  that  the  image  machine  architect  is 
limited  by  the  high-level  languages  that  originally  represent  the  program.  If  these  are  poorly  formed, 
inefficient,  or  erroneous,  there  is  no  way  that  the  architecture,  the  interpreter,  or  even  the  compiler  can  restore 
the initial deficiencies. The best the architect can do is to retain the information present in the high-level
language  representation  and  to  provide  this  information  in  a  form  as  concise  and  useful  as  the  interpreter  and 
the  host  require.  Moreover,  this  interpretive  hierarchy  occurs  at  many  levels  in  the  system. 

Thus,  here  we  will  derive  certain  aspects  of  HLL  program  representation  which  ultimately  determine 
(and  limit)  the  space-time  product  of  image  program  interpretations.  While  we  will  come  up  with  a  number 
of measures of source program behavior, we must remember that these are measures and are not equally
weighted. Their importance depends a great deal on the type of host machine doing the interpretation. These
quantitative measures are expressed in architectural terms, so that for a particular machine representation, one
could  specify  an  ideal  size  and  interpretation  time  for  a  source  program,  for  example,  and  compare  those  ideal 
measures  to  the  achieved  size  and  interpretation  time.  Of  course,  specifying  an  ideal  machine  measure  is  a 
formidable  task  in  itself. 

Before  proceeding  we  need  to  state  some  assumptions: 

• Measures are independent of technology. We are interested in comparing logical representations
and  architectures,  not  in  comparing  different  machine  technologies. 

•  The  original  HLL  source  program  is  a  good  representation  of  the  original  problem;  i.e., 
optimization is source to source. The original program is already optimized to the degree desired.

• Measures focus on two aspects of program representation: space to represent the image program
and time to interpret that representation. Simplicity in generating the representation is an implied
necessity.

•  The  measures  consist  of  correspondence,  size,  activity,  stability  and  distance.  The  first  two 
determine  static  program  size.  The  last  three  affect  in  varying  degrees  the  time  it  takes  to  interpret 
a  program. 

Canonic  Measures  of  Interpretation 

1.  Correspondence.  An  ideal  representation  minimizes  the  number  of  objects  to  be  interpreted  without 
disturbing  transparency  with  respect  to  the  HLL  source.  For  each  semantic  action  in  the  HLL  source 
(addition, subtraction, etc.), there is one instruction in the ideal representation (one CI instruction); for each
unique  name  mentioned  in  the  HLL  statement,  there  is  an  explicit  object  identification  in  the  ideal 
representation. 

2. Size. All objects of the same class can be coded with variable-width identifiers appropriate to the
language environment as follows:

• All variables are identified by ⌈log2 of the number of variables in the environment⌉ bits,

• operations are identified by ⌈log2 of the number of distinct actions specified in the
environment⌉ bits, and

• labels are identified by ⌈log2 of the number of distinct labels specified in the environment⌉ bits,
where the environment is usually taken to be a single subroutine or procedure.

Ideally, the size measure requires that coding be as concise as possible, i.e., using ⌈log2 of the number
of objects in the environment⌉ bits. There are two aspects to coding. First, the objects should be of like kind.
Thus, labels, operators, and values which can be easily distinguished should be treated and coded separately.
Second, coding always takes place in the context of an environment. The statement itself can be an
environment, although measures of time (such as measure 4, below) which count the number of environments
interpreted will increase if such a small coding environment is chosen. This is a trade-off between two
measures: space and time. The scope of the definitions used in the HLL usually defines the
environment; that is, the environment encompasses names or objects of the same scope of definition, e.g.,
subroutines.
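
The size measure can be sketched as a per-environment calculation in which each class of object (variables, operations, labels) gets its own identifier width. The class and field names below are illustrative assumptions; the counts in the usage lines are those the report gives for swapvec in its worked example.

```python
from dataclasses import dataclass


def id_bits(n: int) -> int:
    """ceil(log2(n)) bits identify one of n objects (0 bits if n == 1)."""
    if n < 1:
        raise ValueError("an identifier class needs at least one object")
    return (n - 1).bit_length()


@dataclass(frozen=True)
class Environment:
    """Distinct objects of each class within one coding environment."""
    variables: int
    operations: int
    labels: int

    def widths(self) -> dict:
        # Each class is coded separately, so each gets its own width.
        return {
            "variable": id_bits(self.variables),
            "operation": id_bits(self.operations),
            "label": id_bits(self.labels),
        }


# swapvec: 7 unique operands, 4 unique operators, 2 labels.
swapvec = Environment(variables=7, operations=4, labels=2)
swapvec_widths = swapvec.widths()
```

Choosing a smaller environment (a single statement, say) would shrink these widths but, as noted above, raise the number of environments interpreted.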

3. Activity. This measures the number of objects interpreted. Usually, there are two separate types of
activity: instructions (operations) interpreted and variables accessed from image storage:

• Let Ai be the number of instructions interpreted. Then, ideally, Ai is equal to the number of
semantic actions specified (dynamically) in the HLL source.

• Let Ad be the number of data references required. Ideally, Ad is no more than the dynamic
number of variables encountered.

4.  Stability.  This  measures  the  number  of  environments  encountered  and  other  disruptions  to  the 
ordinary  "in-line"  processing  of  objects. 

• Let Se be the total number of environments encountered.

• Let Sc be the number of computed (interpreted) objects, e.g., array elements.

• Let Sb be the number of control actions (branches dynamically interpreted).

This measure is an extension of the activity measure in that it measures the more global types of activity:
the number of environments encountered, the number of objects whose location has to be computed, and,
finally, the number of branches or procedural statements encountered. Ideally, there is one interpretable
action per HLL action.

5. Distance. This measures the "initialization" required in a program. The ultimate host machine
might have an infinite cache, robust support of array accessing, and an unlimited branch target capture table.
But even such a machine would be limited by the following measures:

•  Let  De  be  the  number  of  unique  environments  entered. 

•  Let  Dc  be  the  number  of  unique  objects  requiring  interpreted  definition. 

• Let Db be the number of unique branch targets.

The  distance  measure  represents  the  total  number  of  unique  environments  and  unique  branches 
encountered. It represents the distance the program has traversed, i.e., the number of localities it has
visited.
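
To make the distinction between stability and distance concrete: the S counts are totals over the dynamic trace, while the D counts are the numbers of distinct items seen. The following is a minimal sketch, assuming a hypothetical trace of (kind, name) events; the event encoding is an illustration and is not part of the report.

```python
from collections import Counter


def stability_and_distance(trace):
    """Return (S, D) dicts from a trace of (kind, name) events.

    kind is one of "env" (environment entered), "computed" (object whose
    location is computed, e.g. an array element) or "branch" (branch
    taken to the named target). S counts every event; D counts each
    distinct name once per kind.
    """
    totals = Counter()
    unique = {"env": set(), "computed": set(), "branch": set()}
    for kind, name in trace:
        if kind not in unique:
            raise ValueError(f"unknown event kind: {kind!r}")
        totals[kind] += 1
        unique[kind].add(name)
    s = {k: totals[k] for k in unique}
    d = {k: len(names) for k, names in unique.items()}
    return s, d


# A main program that is entered once and calls one procedure ten times.
trace = [("env", "main")] + [("env", "swapvec")] * 10
S, D = stability_and_distance(trace)
```

For this trace `S["env"]` is 11 and `D["env"]` is 2: eleven environments encountered, two unique environments entered.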

An  Example  and  a  Comparison 

Consider the Pascal example shown in Table 1: hardshuffle, a program for shuffling vector elements
the hard way, consists of the shuffle procedure swapvec and the main program.

Swapvec interchanges the elements of two vectors from the first element up to the parameter limit.
Hardshuffle, the main program, creates two vectors: identity, consisting of the integers, and sum, which
consists of the sum of the integers. For a variable limit ranging from 1 to 10, swapvec is called to interchange
the elements of the two created vectors. Finally, the values of the vectors are written out. Table 1 shows the
derivation of the CI measures from the original hardshuffle source. The number of instructions, static
(column a) and dynamic (j), correspond to the number of source semantic actions held in memory and
encountered during execution. The column labelled "Data References" (k) is the dynamic count of activity
(both local and main memory) for data objects. The number of syllables interpreted (o) corresponds to the
count of the number of objects (variables, operations and labels) encountered in each instruction times its
dynamic weight. The number of branches encountered (l) is a dynamic count of branch instructions, Sb,
while the distance Db measures the number of unique branch targets encountered. Computed data items (g)
refers in this example to array elements. The dynamic sum of (g) is Sc (p) while the number of unique
occurrences of such items is Dc (q). Of course there are two environments in this little example (De) and since
the main program calls the subroutine ten times we have a total of 11 dynamic environments (Se).

In computing the static CI measures of this program we first must identify the distinct operands and
operators in each of the environments. For swapvec the operands are a1, a2, limit, index, temp, a container for
the final value, and the constant 1, i.e., 7 unique operands. The operators are for, array subscript, end for, and
return: 4 unique operators. There are also two labels. Each line of the body of swapvec basically represents a
single CI instruction; an exception is


a1[index] := a2[index].

A separate instruction is required to compute a2[index] and retrieve its value. This value will be used in the
next CI instruction, which actually stores the value into a1[index]. In computing the number of operands per
CI instruction it is merely necessary to follow the semantics of the statement, thus:

for index := 1 to limit do

has an index operand, two operands for the range (1 and limit) and an operand for the final value. This
particular statement also includes an opcode and a label (for use if limit < 1). Most other statements in this
routine include a single container for each operand mentioned in the statement. Static program size for
swapvec can be computed to be 3 bits per operand X 16 operands + 2 bits per operator X 7 operators +
1 bit per label X 2 labels, totaling 64 bits.

Similarly with the main program the unique operands are i, the operand for the final value, identity,
sum, swapvec, and the constants 1, 2 and 10, for a total of 8 unique operands. The unique operators are array
subscript, for, -, +, end for, call, write, and writeln, for a total of 8 unique operators. There are also 6 labels
in the main program. Static size can be computed as 3 bits per operand X 45 operands + 3 bits per operator
X 20 operators + 3 bits per label X 6 labels, the total being 213 bits.
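
The two static sizes can be checked mechanically: the identifier width for each class follows from the number of unique objects, and the size is the width times the number of fields of that class. In the sketch below the function and variable names are illustrative; the swapvec operand-field count is taken as 16, the value for which the stated 64-bit total holds.

```python
def id_bits(n: int) -> int:
    """ceil(log2(n)) bits identify one of n >= 1 objects."""
    return (n - 1).bit_length()


def static_size(unique, fields):
    """Static size in bits of one environment.

    unique: distinct objects per class, e.g. {"operand": 7, ...}
    fields: number of fields of each class appearing in the code
    """
    return sum(id_bits(unique[c]) * fields[c] for c in fields)


swapvec_bits = static_size(
    unique={"operand": 7, "operator": 4, "label": 2},
    fields={"operand": 16, "operator": 7, "label": 2},
)
main_bits = static_size(
    unique={"operand": 8, "operator": 8, "label": 6},
    fields={"operand": 45, "operator": 20, "label": 6},
)
total_bits = swapvec_bits + main_bits
```

This gives 64 bits for swapvec and 213 bits for the main program, 277 bits in all.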

The hardshuffle example illustrates the rather mechanical nature of deriving the CI parameters. First,
the program is assumed to be optimized, although hardshuffle certainly isn't. Thus, the CI measures can be
derived on a line by line basis. Each HLL statement will take at least one CI instruction. If the statement
involves a computed data item (an indexed array element) an extra instruction sometimes is required.
However, most HLL statements in the main program have one CI instruction in their execution architecture.
Exceptions include a statement with two subscripted variables (line 4), the write instructions (lines 7 and 18)
that use subscripted variables, and the statement computing the sum (line 13), which requires 4 instructions
(-, +, and subscripting). In computing operands, usually the count is simply the number of explicit variables
mentioned in the statement. E.g., on line 2 we have index, 1 and limit. Occasionally the semantics of an
operation requires a hidden operand; e.g., (line 6) the end of a for requires a final value operand. The
statement on line 3 has but one instruction while line 4 requires two instructions. In column (a) we
enumerate the number of instructions per statement, and column (b) enumerates the number of associated
operands per statement. Labels are not tabulated explicitly, but the sum of column (a) plus column (b) plus
implied labels equals the number of syllables, column (e), for that line. Thus on line 2, we see a single
instruction with three associated operands (index, 1, and limit). There is an implied label; thus there are five
syllables to be interpreted for instruction execution. On line 4 it is assumed that one instruction computes the
location of the indicated source operand and fetches it. The second instruction computes the location of the
sink operand and stores the value of the source operand. Thus the two instructions
consist of a total of four operand fields, giving a total of six syllables to be interpreted for their joint execution.
On line 6 the end of the for loop requires one instruction, a return, and an implied operand which contains
the final value as well as a label associated with the return. Thus the counts for the end of the procedure
depend upon the semantics of the instruction sequence initiated. On line 7, for example, the end requires no
operands; hence only two syllables are interpreted. The resulting columns (a) through (h) are weighted by
column (i), the dynamic count or number of times each line is executed. When the respective columns are
weighted we can derive the appropriate CI measures for that line. Summing over all lines we have the total
CI measures for the program.

Table 1: Deriving the CIF for Hardshuffle

[Table body not recoverable from the scan. Static counts per statement: (a) instructions, (b) operands,
(c) main memory reads, (d) main memory writes, (e) syllables, (f) branches, (g) computed data items,
(h) environment, (i) dynamic weight (number of times executed). Dynamically weighted counts:
(j) instructions executed, (k) data references, (l) branches encountered, (m) reads, (n) writes,
(o) syllables interpreted, (p) computed data items (Sc), (q) unique computed data items (Dc).]
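
The weighting step is a weighted sum: each static per-line count is multiplied by the number of times the line executes, and the products are summed over all lines. A minimal sketch follows. The static counts for lines 2 and 4 are the ones described in the text, but the dynamic weights are made-up placeholders, not values from Table 1, and the type and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineCounts:
    """Static counts for one source line plus its dynamic weight."""
    instructions: int
    operands: int
    syllables: int
    weight: int  # number of times the line is executed (placeholder values below)


def dynamic_totals(lines):
    """Weight each static column by the execution count and sum over lines."""
    return {
        "instructions": sum(ln.instructions * ln.weight for ln in lines),
        "operands": sum(ln.operands * ln.weight for ln in lines),
        "syllables": sum(ln.syllables * ln.weight for ln in lines),
    }


lines = [
    LineCounts(instructions=1, operands=3, syllables=5, weight=1),  # line 2
    LineCounts(instructions=2, operands=4, syllables=6, weight=3),  # line 4
]
totals = dynamic_totals(lines)
```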

Main memory read and write references, columns (c) and (d), arise from computed data references.
The principal difference between operand references (b) and main memory references, column (c) plus
column (d), is that the operands are assumed to access a local memory if they are local scalars; other
references go to main (or global) memory. The counts in columns (c) and (d) are included in column
(k), data referencing; these give an indication of the predictability of referencing activity. An architecture
which does not accommodate local references is more likely to encounter memory bandwidth and contention
problems. The total number of data references, column (k), is simply the number of operands per
instruction weighted by the number of times that the instruction is executed. The sum of column (k) for the
program is 830 references. Register-oriented architectures as well as architectures which make provision for
rapid access to local variables should do better than this, but reducing references below the main memory
referencing activity, column (m) plus column (n), should be more difficult.

Since each instruction called for in column (a) requires an op-code syllable and each operand (b)
requires a syllable, the total number of syllables per statement (e) is the sum of (a) and (b) and the number of
labels. Labels are not tabulated directly. The branches (f) and computed data item counts (g) are fairly
obvious. Line 15 represents an initial (for) branch, a call to swapvec and a return.

Table 2 is a complete evaluation for the example hardshuffle for all of the CI measures on a variety of
different architectural approaches. A fair comparison for a variety of architectures is a more formidable task
than might first appear. The measures are significantly influenced by compiler strategies and run time
environments as well as the basic architecture itself. Thus, the data in Table 2 requires some explanation.

The first comparison is with an execution architecture called Adept, developed at Stanford and derived
from principles of minimizing CI measures while maintaining transparency for Pascal programs. By using an
additional format syllable in each instruction it matches most static and dynamic CI instruction count
measures. It also matches the CI measures for memory activity as well as most of the stability and distance
measures. Additional syllables per instruction add about 5 bits to each instruction and hence account for
about 110 additional bits in static program size. An additional 800 bits of Adept are used to hold addresses for
subscripted variables, array bases, and other environmental data. An Adept variable reference consists of the
addition of an environmental pointer to a variable index whose container matches the log2 CI requirement.
Each environment then will have its own environmental pointer and container width. Some variables such as
array elements have an address computation before the element can be retrieved from main storage, which
contains the image array. Thus the address of the base of the vector must be stored as well as the retrieved
array element. The additional Adept space includes these address constants and other values containing
information for the routine to execute properly. The object code for Adept is based upon a 1 + 1 pass
compiler developed by S. Wakefield. The Adept system has a simple run time environment required for a
small stand alone system.

The pdp-11 figures are based upon the Pascal compiler developed at Vrije University (the Netherlands),
Pascal-VU. It produces an intermediate program representation EM-1, developed by A. Tanenbaum, which
it further translates into pdp-11 code.

The static program size is the size of the instruction stream; however, in the pdp-11 architecture many
of the data parameters are represented as immediate data in the instruction stream. The number of computed
data references was not tabulated for the pdp-11 because of the nature of the architecture: the pdp-11
architecture is heavily oriented towards conditional interpretation of successor syllables. Overlapping
instructions and/or syllable interpretation would be an extremely difficult process at best; therefore the total
amount of sequentiality in the instruction stream is not a measure of the architecture's responsiveness to a
particular example. The additional number of environments encountered in the pdp-11 object code is a direct
result of the write commands in the subroutine. These are implemented in the object code as calls to the
operating system.

Before discussing the P-code or 370 architectures the computation of the computed data references
should be examined. There is no problem in counting the occurrence of array elements in the source code or
in fact in a close surrogate to source code such as the Adept code. In more traditional machines the
occurrence of a computed data reference is a more murky event. It manifests itself as the loading of an index
register or an indexed value in one instruction, and the use in the immediately following instruction of that
value as a parameter in an address computation. In a stack machine such as the P-code processor a value is
placed on a stack which is used in the next instruction as an address. In System 370 the following is a typical
instance:

LR 15, A(13,14)    instruction i

LR 2, B(15,10)     instruction i+1

In the above, a value is loaded into a register (15 in this case) and used as an index/base value in the
immediately following instruction. The result of this event is a potential "break" in the pipeline. The address
of the operand in instruction i+1 cannot be computed until instruction i is fully executed. This delay is
reflected as an increased execution time for instruction i+1 in overlapped machines, especially where this
deferred data access is in the vicinity of a conditional branch.
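
On a register machine this event can be detected by examining adjacent instruction pairs: a computed data reference shows up when the register written by instruction i is used as an index or base register by instruction i+1. The following is a minimal sketch, assuming a simplified instruction record; the `Load` type and its fields are hypothetical and are not a System 370 encoding.

```python
from typing import NamedTuple, Sequence, Tuple


class Load(NamedTuple):
    """Simplified load: dest <- memory[disp + index reg + base reg]."""
    dest: int                    # register written
    addr_regs: Tuple[int, int]   # (index, base) registers read


def address_dependencies(code: Sequence[Load]) -> int:
    """Count pairs where instruction i+1 forms its address from the
    register that instruction i has just loaded."""
    return sum(
        1
        for prev, nxt in zip(code, code[1:])
        if prev.dest in nxt.addr_regs
    )


# The two-instruction sequence from the text: register 15 is loaded,
# then used as an index/base value by the following instruction.
sequence = [Load(dest=15, addr_regs=(13, 14)),
            Load(dest=2, addr_regs=(15, 10))]
breaks = address_dependencies(sequence)
```

Each counted pair is a point where an overlapped machine cannot start the address computation for instruction i+1 until instruction i has completed.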

In computing the number of environments, we count the number of different environment change
indicators, such as the Branch and Link (BAL) instruction (System 370). The called routine, while outside the
object code listing, may call other environments; these are not accounted for in our counts, a fact that we shall
discuss shortly.

The P-code machine is actually a surrogate for the Pascal language. It is a stack oriented machine and
meant to be a transportable medium for Pascal programs. Any host machine can compile into P-code from a
Pascal source and (in theory at least) another machine equipped with a P-code interpreter can execute this
compiled code. The emphasis for most P-code compilers is rapid compilation; thus the P-code statistics are
derived from a non-optimized compiler, much the same in philosophy as Adept.

The large number of dynamic instruction occurrences for P-code when compared to Adept (over 5 to
1) is largely accounted for by the push and pop instructions inherent in a stack machine. Notice that the
dynamic number of P-code branches, for example, is less than a factor of two to one over the CI measure.
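
The effect of pushes and pops can be illustrated by counting instructions for a single assignment. The following is a minimal sketch, assuming a generic stack machine (not the actual P-code instruction set) and an idealized encoding with one instruction per semantic action; the tuple-based expression form and the function names are illustrations only.

```python
def stack_instructions(expr) -> int:
    """Instructions a simple stack machine needs to evaluate `expr`.

    expr is a variable name (str) or a tuple (op, left, right).
    A leaf costs one push; an operator costs one instruction after
    both of its operands have been evaluated.
    """
    if isinstance(expr, str):
        return 1
    _op, left, right = expr
    return stack_instructions(left) + stack_instructions(right) + 1


def ideal_instructions(expr) -> int:
    """One instruction per operator in the expression."""
    if isinstance(expr, str):
        return 0
    _op, left, right = expr
    return ideal_instructions(left) + ideal_instructions(right) + 1


def assignment_cost(expr):
    """(stack, ideal) instruction counts for `target := expr`."""
    stack = stack_instructions(expr) + 1        # final pop into the target
    ideal = max(1, ideal_instructions(expr))    # a plain copy still needs one
    return stack, ideal


cost = assignment_cost(("+", "b", "c"))   # a := b + c
```

For `a := b + c` the stack form needs two pushes, an add, and a pop, against a single three-operand instruction in the idealized form.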

In comparing the System 370 to any of the other environments one is faced with immense problems. So
far we have been discussing machines and measures in very limited run time environments with relatively
minimal generality in support for non-Pascal system facilities, the antithesis of the generalized [illegible]
provided by the 370 Operating System. While the 370 program size itself is 3056 bits, this excludes prolog,
epilog and data space, which alone (through a standard interface) is reserved at 16000 bits. This overwhelms
our comparison, and since it contains or allows for a great deal more information handling and
communications than required either in this program or by any of the other architectures, we largely eliminate
(insofar as possible) instructions or data areas which are not specifically associated with the program
hardshuffle. The column labelled "without linkage" represents the additional number of instructions in the
minimum linkage path between the two routines. Excluded from this are the instructions executed as part of
the linkage which are calls to common run time facilities, space allocation, etc. These are again excluded in
our comparison since it seemed to us that the inclusion of such data is more a measure of run time philosophy
and its generality than a measure of architecture itself. Calls to such facilities during routine entry are not
counted in the environment counts either. To fully include all instructions executed in a typical System 370
program plus all data areas and prolog and epilog areas would increase the cited numbers by several times.
Thus, the 370 numbers can be interpreted as minimum numbers in comparing with the other architectural
figures. The 370 numbers reflect an estimate of the measures of the architecture in a very simple dedicated
runtime environment which simply is not available to us to measure. As a further experiment on 370 the
hardshuffle source program was recoded in PL/I and recompiled using an optimizing PL/I compiler. The
increased generality of PL/I plays a role in limiting the compiler's ability to optimize the program. The
additional environments introduced in the PL/I version of the program result from the compiler causing
several environment changes per source write command.

It  is  interesting  to  note  that  at  least  for  this  example  the  more  dramatic  variations  in  architectural 
measures  occur  in  parameters — such  as  space,  dynamic  instruction  count,  and  syllables  interpreted — that 
affect  simpler  hosts,  particularly  partially  mapped  and  well  mapped  machines.  Parameters  such  as  stability 
and  distance  remain  relatively  invariant  from  the  canonic  measures  over  the  spectrum  of  architectures 
considered.  In  fact,  compilers  seem  to  play  a  more  significant  role  than  the  architectural  arrangements 
themselves. This supports the observation, more or less a truism, that compiler technology becomes even more
important than the architecture as the interpreter and executor technology is enhanced, while for simpler
interpreters (hosts) the architecture seems to play a dominant role in determining execution performance.